{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Create the dataset of labeled Examples"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The main goal of this notebook is to collect all the evaluations of method coherence on which\n",
    "the evaluators reached (*perfect*) agreement, and to store them in the database as\n",
    "`coherence_dataset.models.Example` instances.\n",
    "\n",
    "These instances simply make it easier to retrieve the data from the database\n",
    "to compute statistics and/or to train *Machine Learning* models.\n",
    "In fact, querying `Example` instances directly avoids having to fetch the judges'\n",
    "evaluations (**each time**), intersect their agreements, and finally assemble the data\n",
    "to create the **training set**."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Note:** this notebook assumes the use of **Python 3**."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Preamble: Set Django Environment and Imports"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "%load preamble_directives.py"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Get Evaluations\n",
    "\n",
    "Get the `SoftwareProject` instances whose methods have been analysed by the evaluators so far."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[<SoftwareProject: CoffeeMaker (1.0)>, <SoftwareProject: JFreeChart (0.6.0)>, <SoftwareProject: JFreeChart (0.7.1)>, <SoftwareProject: JHotDraw (7.4.1)>]\n"
     ]
    }
   ],
   "source": [
    "from source_code_analysis.models import SoftwareProject\n",
    "projects = list()\n",
    "projects.append(SoftwareProject.objects.get(name__iexact='CoffeeMaker', version__exact='1.0'))\n",
    "projects.append(SoftwareProject.objects.get(name__iexact='Jfreechart', version__exact='0.6.0'))\n",
    "projects.append(SoftwareProject.objects.get(name__iexact='Jfreechart', version__exact='0.7.1'))\n",
    "projects.append(SoftwareProject.objects.get(name__iexact='JHotDraw', version__exact='7.4.1'))\n",
    "print(projects)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Setting the environment"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "from collections import defaultdict\n",
    "\n",
    "from coherence_dataset.settings import NOT_COHERENT, COHERENT\n",
    "\n",
    "# One pair of judges per project, in the same order as the `projects` list\n",
    "judges_combinations = (('leonardo.nole', 'rossella.linsalata'),\n",
    "                       ('leonardo.nole', 'antonio.petrone'),\n",
    "                       ('leonardo.nole', 'antonio.petrone'),\n",
    "                       ('leonardo.nole', 'rossella.linsalata'),)\n",
    "\n",
    "# Target labels, in the order (NOT_COHERENT, COHERENT)\n",
    "CODES_Labels = (NOT_COHERENT, COHERENT)\n",
    "stats_results = defaultdict(list)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "from evaluations import Judge\n",
    "from coherence_dataset.models import Example\n",
    "from coherence_dataset.settings import COHERENT\n",
    "\n",
    "for pno, project in enumerate(projects): \n",
    "    \n",
    "    print('Processing Evaluation for ', str(project), ' Project')\n",
    "    \n",
    "    # Map each method id to its method instance\n",
    "    code_methods = project.code_methods.all()\n",
    "    method_ids_map = {method.id: method for method in code_methods}\n",
    "        \n",
    "    print('Gathered ', len(method_ids_map), ' Methods')\n",
    "\n",
    "    # Judges assigned to this project (same index as in `projects`)\n",
    "    j1_usrname, j2_usrname = judges_combinations[pno]\n",
    "    j1 = Judge(j1_usrname, project.name, project.version)\n",
    "    j2 = Judge(j2_usrname, project.name, project.version)\n",
    "    \n",
    "    # Getting just NC and CO evaluations\n",
    "    j1_evals = j1.two_codes_evaluations  \n",
    "    j2_evals = j2.two_codes_evaluations\n",
    "    \n",
    "    project_stats = list()\n",
    "    for i, label in enumerate(CODES_Labels):\n",
    "        j1_evals_code = j1_evals[i]\n",
    "        j2_evals_code = j2_evals[i]\n",
    "        \n",
    "        # Keep only the methods on which both judges agree\n",
    "        method_ids = j1_evals_code.intersection(j2_evals_code)\n",
    "        \n",
    "        # Examples whose target is COHERENT are the positive ones\n",
    "        example_kind = 'Positive' if label == COHERENT else 'Negative'\n",
    "        print('Gathered ', len(method_ids), \n",
    "              ' for {0} examples'.format(example_kind))\n",
    "        \n",
    "        saved_instances_counter = 0\n",
    "        for mid in method_ids:\n",
    "            method = method_ids_map[mid]            \n",
    "            try: \n",
    "                # Skip methods that already have an Example attached\n",
    "                _ = method.example\n",
    "            except Example.DoesNotExist:\n",
    "                example = Example()\n",
    "                example.method = method\n",
    "                example.target = label\n",
    "                example.save()\n",
    "                saved_instances_counter += 1\n",
    "                \n",
    "        print('Saved ', saved_instances_counter, \n",
    "              ' for {0} examples'.format(example_kind))\n",
    "        "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Test"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Verify that the instances have actually been saved to the DB and that querying the dataset works as expected."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "from coherence_dataset.models import Example"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Total examples in Dataset:  2881\n"
     ]
    }
   ],
   "source": [
    "examples = Example.objects.all()\n",
    "print(\"Total examples in Dataset: \", examples.count())"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "from coherence_dataset.settings import COHERENT, NOT_COHERENT"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Positive examples:  1713\n"
     ]
    }
   ],
   "source": [
    "print(\"Positive examples: \", examples.filter(target=COHERENT).count())"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Negative examples:  1168\n"
     ]
    }
   ],
   "source": [
    "print(\"Negative examples: \", examples.filter(target=NOT_COHERENT).count())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Get Statistics per-Project"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\t\t\t Positive \t Negative\n",
      "CoffeeMaker (1.0) \t 27 \t\t 20\n",
      "JFreeChart (0.6.0) \t 406 \t\t 55\n",
      "JFreeChart (0.7.1) \t 520 \t\t 68\n",
      "JHotDraw (7.4.1) \t 760 \t\t 1025\n"
     ]
    }
   ],
   "source": [
    "print('\\t\\t\\t Positive \\t Negative')\n",
    "for project in projects:\n",
    "    data = examples.filter(method__project__id=project.id)\n",
    "    print('{0} \\t {1} \\t\\t {2}'.format(str(project),\n",
    "                                    data.filter(target=COHERENT).count(),\n",
    "                                    data.filter(target=NOT_COHERENT).count()))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.4.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}
